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Abstract: Wc consider the problem of covariance matrix estimation in the presence of latent variables. 
Under suitable conditions, it is possible to learn the marginal covariance matrix of the observed 
variables via a tractable convex program, where the concentration matrix of the observed variables 
is decomposed into a sparse matrix (representing the graphical structure of the observed variables) 
and a low rank matrix (representing the marginalization effect of latent variables). We present an 
efficient first-order method based on split Bregman to solve the convex problem. The algorithm is 
guaranteed to converge under mild conditions. We show that our algorithm is significantly faster than 
the state-of-the-art algorithm on both artificial and real-world data. Applying the algorithm to a gene 
expression data involving thousands of genes, we show that most of the correlation between observed 
variables can be explained by only a few dozen latent factors. 
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1. Introduction 

Estimating covariance matrices in the high-dimensional setting arises in many apphcations and has drawn 
considerable interest recently. Because the sample covariance matrix is typically poorly behaved in the high- 
dimensional regime, regularizing the sample covariance based on some assumptions of the underlying true 
covariance is often essential to gain robustness and stability of the estimation. 

One form of regularization that has gained popularity recently is to require the the underlying inverse 
covariance matrix to be sparse [1-4]. If the data follow a multivariate Gaussian distribution with covariance 
matrix E, the entries of the inverse covariance matrix K = Ti~^ (also known as concentration matrix or 
precision matrix) encode the information of conditional dependencies between variables: Kij = if the 
variables i and j arc conditionally independent given all others. Therefore, the sparsity regularization is 
equivalent to the assumption that most of the variable pairs in the high-dimensional setting are conditionally 
independent. 

To make the estimation problem computational tractable, one often adopts a convex relaxation of the 
sparsity constraint and uses the Hi norm to promote the sparsity of the concentration matrix [3-6]. Denote 
S" the empirical covariance. Under the maximum likelihood framework, the covariance matrix estimation 
problem is then formulated as solving the following optimization problem: 

min-logdctii: + tr(S"ii:) + A||K||i 

s.t. K > 0, (1) 

where tr denotes the trace, A is a sparsity regularization parameter, and K > ^ denotes that K is positive 
semidefinite. Due to the £i penalty term and the explicit positive definite constraint on K, the method leads 
to a sparse estimation of the concentration matrix that is guaranteed to be positive definite. The problem is 
convex and many algorithms have been proposed to solve the problem efficiently in high dimension [4, 7-9] . 

However, in many real applications only a subset of the variables are directly observed, and no additional 
information is provided on both the number of the latent variables and their relationship with the observed 
ones. For instance, in the area of functional genomics it is often the case that only mRNAs of the genes can 
be directly measured, but not the proteins, which are correlated but have no direct correspondence to the 
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mRNAs because of the prominent role of the postranscriptional regulation. Another example is the movie 
recommender system where the preference of a movie can be strongly influenced by latent factors such as 
advertisements, social environment, etc. In these and other cases, the observed variables can be densely 
correlated because of the marginalization over the unobserved hidden variables. Therefore, the sparsity 
regularization alone may fail to achieve the desire results. 

Wc consider the setting in which the hidden (Xh) and the observed variables (Xq) arc jointly Gaussian 
with covariance matrix ■ The marginal statistics of the observed variable Xq are given by the marginal 

covariance matrix Eo, which is simply a submatrix of the full covariance matrix Sjq^). Let the concentration 
matrix K(^oh) = ^(oh)' '^^^ marginal concentration matrix corresponding to the observed variables 
Xq is given by the Schur complement [10]: 

Ko^^o' =Ko-KomKh'Kh^o, (2) 

where Kq, Kq.h , and Kh are the corresponding submatrices of the full concentration matrix. Based on 
the Schur complement, it is clear that the marginal concentration matrix of the observed variables can be 
decomposed into two components: one is Kq, which specifies the conditional dependencies of the observed 
variables given both the observed and latent variables, and the other is Ko,hK^^ Kh,o, which represents the 
effect of marginalization over the hidden variables. One can now impose assumptions to the two underlying 
components separately. 

By assuming that the Kq matrix is sparse and the number of latent variables is small, the maximum 
likelihood estimation of the covariance matrix of the observed variables at the presence of latent variables 
can then be formulated as 

min - logdet(5' - L) + tr(Sg(S' - L)) + AillS-Hi + A2 tr(L) 

s.t. S'-L^O, LhO. (3) 

where we decompose ^ S — L with 5* denoting Kq and L denoting Ko,hK^^ Kh,o- Because the number 
of the hidden variables is small, L is of low rank, whose convex relaxation is the trace norm. There are two 
regularization parameters in this model: Ai regularizes the sparsity of S, and A2 regularizes the rank of L. 
Under certain regularity conditions, Chandrasekaran et al. showed that this model can consistently estimate 
the underlying model structure in the high-dimensional regime in which the number of observed/hidden 
variables grow with the number of samples of the observed variables [10]. 

The objective function in (3) is strictly convex, so a global optimal solution is guaranteed to exist and be 
unique. Finding the optimal solution in the high-dimension setting is computationally challenging due to the 
log det term appeared in the likelihood function, the trace norm, the nondifferentiability of the £1 penalty, 
and the positive semidefinite constraints. For large-scale problems, the state-of-the-art algorithm for solving 
(3) is to use the special purpose solver LogdetPPA [11] developed for log-determinant semidefinite programs. 
However, the solver LogdetPPA is designed to solve smooth problems. In order to use LogdetPPA, one has 
to reformulate (3) to a smooth problem. As a result, no optimal sparse matrix S can be generated and 
additional heuristic steps involving thresholding have to be applied in order to produce sparsity. In addition, 
LogdetPPA is not especially designed for (3). We believe much more efficient algorithms can be generated 
by considering the unique structures of the model specifically. 

The main contribution of this paper contains two aspects. First, wc present a new algorithm for solving (3) 
and show that the algorithm is significantly faster than the state-of-the-art method, especially for large-scale 
problems. The algorithm is derived by reformulating the problem and adapting the split Bregman method 
[8, 9]. We derive closed form solutions for each subproblem involved in the split Bregman iterations. Second, 
we apply the method to analyze a large-scale gene expression data, and find that the model with latent 
variables explain the data much better than the one without assuming latent variables. In addition, wc find 
that most of the correlations between genes can be explained by only a few latent factors, which provides a 
new aspect for analyzing this type of data. 

The rest of the paper is organized as follows. In Section 2, we derive a split Bregman method, called 
SBLVGG, to solve the latent variable graphical model selection problem (3). The convergence property of 
the algorithm is also given. SBLVGG consists of four update steps and each update has explicit formulas to 
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calculate. In Section 3, we illustrate the utility of our algorithm and compare its performance to LogdetPPA 
using both simulated data and gene expression data. 



2. Split Bregman method for latent variable graphical model selection 



The split Bregman method was originally proposed by Osher and coauthors to solve total variation based 
image restoration problems [12]. It was later found to be either equivalent or closely related to a number of 
other existing optimization algorithms, including Douglas- Rachford splitting [13], the alternating direction 
method of multipliers (ADMM) [12, 14, 15] and the method of multipliers [16]. Because of its fast convergence 
and the easiness of implementation, it is increasingly becoming a method of choice for solving large-scale 
sparsity recovery problems [17, 18]. Recently, it is also used to solve (1) and find it is very successful [8, 9]. 

In this section, we first reformulate the problem by introducing an auxiliary variable and then proceed 
to derive a split Bregman method to solve the reformulated problem. Here we would like to emphasize that, 
although split Bregman method has been introduced to solve graphical model problems [8, 9], we have our 
own contributions. Firstly, it is our first time to use split Bregman method to solve (3) and we introduce an 
auxiliary variable for a data fitting term instead of penalty term which has been adopted in [8, 9]. Secondly, 
Secondly, the update three hasn't been appeared in [8, 9] and we provide an explicit formula for it as well. 
Thirdly, instead of using eig (or schur) decomposition as done in previous work [8, 9], we use the LAPACK 
routine dsyevd.f (based on a divide-and-conquer strategy) to compute the full eigenvalue decomposition of 
a symmetric matrix which is essential for updating the first and third subproblcms. 



2.1. Derivation of the split bregman method for latent variable graphical model selection 

The log-likelihood term and the regularization terms in (3) arc coupled, which makes the optimization 
problem difficult to solve. However, the three terms can be decoupled if we introduce an auxiliary variable 
to transfer the coupling from the objective function to the constraints. More specially, the problem (3) is 
equivalent to the following problem 

{A,S,L) = arg nun -logdetA + ir(ES^) + Aill5|li + A2tr(L) (4) 

s.t. A = S -L 

The introduction of the new variable of A is a key step of our algorithm, which makes the problem amenable 
to a split Bregman procedure to be detailed below. Although the split Bregman method originated from 
Bregman iterations, it has been demonstrated to be equivalent to the alternating direction method of multi- 
pliers (ADMM) [14, 15, 19]. For simplicity of presentation, next we derive the split Bregman method using 
the augmented Lagrangian method [16, 20]. 

We first define an augmented Lagrangian function of (4) 

C{A,S,L,U) := - \ogdct A + tr{T.'^ A) + Xi\\S\\i+ X2tr{L) 

+tr{U{A^S + L)) + ^\\A-S + L\\l, (5) 

where J7 is a dual variable matrix corresponding to the equality constraint A = S — L, and ^ > is a 
parameter. Compared with the standard Lagrangian function, the augmented Lagrangian function has an 
extra term — 5' + L]||,, which penalizes the violation of the linear constraint A ^ S — L. 

With the definition of the augmented Lagrangian function (5), the primal problem (4) is equivalent to 

min max£(A, 5, L, [/). (6) 

A>-o,Lyo,s u 

Exchanging the order of min and max in (6) leads to the formulation of the dual problem 

max£;([/) with E(U) = min C(A,S,L,U). (7) 



Note that the gradient VE{U) can be calculated by the following [21] 



VE{U)=A{U)-S{U)+L{U), (8) 

where {A{U),S{U),L{U)) = &igTninA^^^L>o,s C{A,S,L,U). 

Applying gradient ascent on the dual problem (7) and using equation (8), we obtain the method of 
multipliers [16] to solve (4) 

f(A'^-+i,5'=+i,L'=+i) = argmin^^o,Lto,s/:(A^,i,C/'=), , . 

y^k+i ^ijk^ ^(^fc+i _ ^fc+i ^ ^fe+i)^ y ) 

Here we have used ^ as the step size of the gradient ascent. It is easy to see that the efhciency of the 
iterative algorithm (9) largely hinges on whether the first equation of (9) can be solved efficiently. Note 
that the augmented Lagrangian function C{A,S,L,U^) still contains A,S,L and can not easily be solved 
directly. But we can solve the first equation of (9) through an iterative algorithm that alternates between 
the minimization of A, S and L. The method of multipliers requires that the alternative minimization of A^ S 
and L arc run multiple times until convergence to get the solution [A*'^^ , S^^"^ , L^^^) . However, because 
the first equation of (9) represents only one step of the overall iteration, it is actually not necessary to be 
solved completely. In fact, the split Bregman method (or the alternating direction method of multipliers [14]) 
uses only one alternative iteration to get a very rough solution of (9), which leads to the following iterative 
algorithm for solving (4) after some reformulations, 

^fc+i ^ argmiuA^o - logdet(yl) + tr{AY.l) + f||^~S"^ + L'^ + ^||^, 

^fe+i = argmins Allien 1 + f -S + L^ + ^\\\, 

L^+i - argminL>-o A2tr(L) + - ^''^+1 + L + ^11^, 

jjk+i ^ jjk ^ ^(^fe+i _ + ik+iy 



2.1.1. Convergence 

The convergence of the iteration (10) can be derived from the convergence theory of the alternating direction 
method of multipliers or the convergence theory of the split Bregman method [14, 17, 22]. 

Theorem 1. Let {S^^L^) he generated by (10), and {S,L) be the unique minimizer of (4). Then, 

lim 115'^' - 511 = and lim \\L^ - L\\ = 0. 

From Theorem 1, the condition for the convergence of the iteration (10) is quite mild and even irrelevant 
to the choice of the parameter fi in the iteration (10). 



2.1.2. Explicit formulas to update A, S and L 



We first focus on the computation of the first equation of (10). Taking the derivative of the objective function 
and setting it to be zero, we get 

- A"^ + i:i + u'' + ^l{A- S'' + L^) = Q (11) 

It is a quadratic equation where the unknown is a matrix. The complexity for solving this equation is at least 
0{p^) because of the inversion involved in (11). Note that 115111 = |15^|ji and L = L-^, if is symmetric, so 
is Sq + — ^{S^ — L^). It is easy to check that the explicit form for the solution of (11) under constraint 
A ^ 0, i.e., is 
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where K'^ = fJ,{S'' — L'^) — — U'' and y/C denotes the square root of a symmetric positive definite matrix 
C. Recall that the square root of a symmetric positive definite matrix C is defined to be the matrix whose 
eigenvectors arc the same as those of C and eigenvalues arc the square root of those of C. Therefore, to get 
the update of A'^'^^, can first compute the eigenvalues and eigenvectors of K'', and then get the eigenvalues of 
A'''^-^ according to (12) by replacing the matrices by the corresponding eigenvalues. We adopt the LAPACK 
routine dsycvd.f (based on a dividc-and-conqucr strategy) to compute the full eigenvalue decomposition of 
{K^Y + 4/i/. It is about 10 times faster than eig (or schur) routine when n is larger than 500. 

For the second equation of (10), wc have made the data fitting term — S' + L'^' + ^||^ separable 

with respect to the entries of A. Thus, it is very easy to get the solution and the computational complexity 
would be 0{p^) for \\S\\i is also separable. Let 7a be a soft thresholding operator defined on matrix space 
and satisfying 

where t\{ujij) = sgn(wij) niax{0, \LOij \ — A}. Then the update of S is 

For the update of L, it can use the following Theorem. 
Theorem 2. Given a symmetric matrix X and rj > 0. Denote 

S,{X) - argmin7y<r(r) + ^WY - X\\l. 

Then Sjj{X) — Vdiag{{Xi — 1J)-^-)V'^, where Xi{i £ l,...,n) are the eigenvalues of X with V being the 
corresponding eigenvector matrix and (Ai — rj)-!- ^ max(0. A; — rj). 

Proof. Note that tr{Y) = where / is the identity matrix. Thus, argminy;_o 77<r(F) + ^\\Y — XWj^ = 

arg miny^o(^ — X + r]I, Y — X + rjl). Compute eigenvalue decomposition on matrix X and get X = VKV'^ ^ 
where VV^ = V^V = / and A is the diagonal matrix. Then 

{Y -X + rjl, Y -X + ijl) = {V^YV - (A - V^YV - (A - ry/)). 

Together with the fact that 5,, (A) ^ 0, 5^ (A) should satisfy [V^ Sri{X)V)ij = max(0, \i — if) for i = j and 
otherwise. Therefore, 5,,(A) = Vdiag{{\i - ri)+)V'^ . □ 

Using the operator Sn defined in Theorem 2, it is easy to see that 

L'-^+i ^ 5a, (S-'^+i - A'^'+i - M^^C/*^). (13) 

Here we also use the LAPACK routine dsyevd.f (based on a divide-and-conquer strategy) to compute the 
full eigenvalue decomposition of 5''"'+^ — A^^^ — i^r^U^ . Summarizing all together, we get SBLVGG to solve 
the latent variable Gaussian Graphical Model (3) as shown in Algorithm 1. 

Algorithm 1: Split Bregman method for solving Latent Variable Gaussian Graphical Model (SBLVGG) 

Initialize S^\L'-\U". 
repeat 

1) j^k + l ^ K''+^(K>-)^+4^ _l^ ^^^^^ ^fe ^ ^^_gfe _ ifc) _ S - f/fe 

2) 5*=+! = r>^{A''+'^ + L'' + tj.-'^U'') 

3) L'^+i = st, (5*+! - A'^+i - /i-if/*^) 

4) C/'=+i = [/'=+ fi[A''+'^ - S'^+i + L'^+i) 
until 

Convergence 
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3. Numerical experiments 



Next we illustrate the efficiency of the split Bregman method (SBLVGG) for solving (3) using time trials on 
artificial data as well as gene expression data. All the algorithms were implemented in Matlab and run on a 
64-bit linux desktop with Intel 13 - 3.2GHz QuadCore CPU and 8GB memory. To evaluate the performance 
of SBLVGG, we compare it with logdctPPA [11] which is state-of-art solver for (3) in large-scale case. 
LogdctPPA was originally developed for log-determinant semidefinite programs with smooth penalties. In 
order to solve (3) using LogdetPPA , we need to reformulate (3) as a smooth problem as done in [10], which 
results in the derived sparse matrix S not strictly sparse with many entries close to but not exactly 0. We 
also demonstrate that latent variable Gaussian graphical selection model (3) is better than sparse Gaussian 
graphical model (1) in terms of generalization ability using gene expression data. 

Note that the convergence of Algorithm ?? is guaranteed no matter what values of /i is used as shown in 
Theorem 1. The speed of the algorithm can, however, be influenced by the choices of fi as it would affect 
the number of iterations involved. In our implementation, we choose fi in [0.005, 0.01] for artificial data and 
[0.001,0.005] for gene expression data. 

3.1. Artificial data 

Let p — Po + Ph with p being the total number of variables in the graph, po the number of observed variables 
and Ph the number of hidden variables. The synthetic data are generated in a similar way as the one in 
Section 6.1 of [11]. First, we generate an p x p random sparse matrix W with non-zero entries drawn from 
normal distribution A^(0, 1). Then set 

C = W' '^W; C(l : Po,Po + 1 : p) = C(l : Po,Po + 1 : + 0.5 * randn{po,Ph)] 

C={C + C')/2- d^diag{C); C = maximiniC - diag{d),l), -1); 

K = B + max{-1.2 * min{eig{B)), 0.001) * eye{p); Kq A'(l : PoA ■ Po) 

KoH = K{1 : po,Po + 1 : p)\ Kho = K{Po + 1 : p, 1 : Po); 

Kh = K{po + l:p,Po + l-p);Ko = Ko - KqhK^^Kho- 




500 1000 1500 2000 2500 3000 1000 2000 3000 

Number of Observed Variables in Synthetic Data Number of Genes in Real Data 



(a) (b) 

Fig 1: (a) Comparison of CPU time curve w.r.t. number of variables p for artificial data; (b) Comparison of CPU 
time curve w.r.t. number of variables p for gene expression data 
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Table 1 

Numerical comparison at po = 3000, ph = 10 for artificial data 



(Ai.Aa) 


Method 


Obj. Value 


Rank 


Sparse Ratio 


(0.0025, 0.21) 


SBLVGG 


-5642.6678 


8 


5.56% 


lodgctPPA 


-5642.6680 


8 


99.97% 


(0.0025, 0.22) 


SBLVGG 


-5642.4894 


3 


5.58% 


lodgctPPA 


-5642.4895 


3 


99.97% 


(0.0027, 0.21) 


SBLVGG 


-5619.2744 


16 


4.14% 


lodgctPPA 


-5619.2746 


16 


99.97% 


(0.0027, 0.22) 


SBLVGG 


-5619.0194 


6 


4.17% 


lodgctPPA 


-5619.0196 


6 


99.97% 



Note that Kq is marginal precision matrix of observed variables. We generate n Gaussian random sam- 
ples from Ko, and calculates its sample covariance matrix Sq. In our numerical experiments, we set 
sparse ratio of Kq around 5%, and ph = 10. The stopping criteria for SBLVGG is specified as follows. 
Let $(A,L) = -\ogdetA + tr{AT,) + Xi\\A + L\\i + \2tr{L). We stop our algorithm if \<^{A^+^ ,L^+'^) - 
$(A^L'=^)|/max(l,|$(A'=+^L'=+l)|) < e and \\A- S + L\\f <e with e = le - 4. 

Figure 1(a) shows CPU time curve of SBLVGG and LogdetPPA with respect to the number of variable 
p for the artificial data. For each fixed p, the CPU time is averaged over 4 runs with four different (Ai, A2) 
pairs. We can see SBLVGG consistently outperform LogdetPPA. For dimension of 2500 or less, it is 3.5 
times faster on average. For dimension 3000, it is 4.5 times faster. This also shows SBLVGG scales better to 
problem size than LogdetPPA. In terms of accuracy, Table 1 summarize performance of two algorithms at 
Po = 3000, Ph = 10 in three aspects: objective value, rank of L, sparsity of S (ratio of non-zero off-diagonal 
elements). We find in terms of objective value and rank, both algorithms generate almost identical results. 
However, SBLVGG outperform LogdetPPA due to its soft- thresholding operator in Algorithm ?? for 5, 
while LogdetPPA misses this kind of operator and result in many nonzero but close to zero entries due 
to numerical error. We would like to emphasize that the results in lower dimensions are very similar to 
Po = 3000, Ph = 10. We omit the details here due space limitation. 

3.2. Gene expression data 

The gene expression data [23] contains measurements of mRNA expression level of the 6316 genes of S. 
cerevisiae (yeast) under 300 different experimental conditions. First we centralize the data and choose three 
subset of the data, 1000, 2000 and 3000 genes with highest variances. Figure 1(b) shows CPU time of SBLVGG 
and LogdetPPA with different p. We can see that SBLVGG consistently perform better than LogdetPPA: 
in 1000 dimension case, SBLVGG is 2.5 times faster, while in 2000 and 3000 dimension case, almost 3 times 



Table 2 

Numerical comparison at 3000 dimensional subset of gene expression data 



(Ai,A2) 


Algorithm 


Obj. Value 


Rank 


# Non-0 Entries 


(0.01,0.05) 


SBLVGG 
LodgctPPA 


-9793.3451 
-9793.3452 


88 
88 


34 
8997000 


(0.01,0.1) 


SBLVGG 
LodgctPPA 


-9607.8482 
-9607.8483 


60 
60 


134 
8997000 


(0.02,0.05) 


SBLVGG 
LodgctPPA 


-8096.2115 
-8096.2115 


79 
79 




8996998 


(0.02,0.1) 


SBLVGG 
LodgctPPA 


-8000.9047 
-8000.9045 


56 
56 




8997000 



faster. Tabic 2 summarize the accuracy for p = 3000 dimension case in three aspects: objective value, rank of 
L, sparsity of S (Number of non-zero off-diagonal elements) for four fixed pair of (Ai, A2). Similar to artificial 
data, SBLVGG and LogdetPPA generate identical results in terms of objective value and number of hidden 
units. However, logdetPPA suffers from the floating point problem of not being able to generate exact sparse 
matrix. On the other hand, SBLVGG is doing much better in this aspect. 
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Table 3 

Comparison of generalization ability on gene expression data at dimension of 1000 using latent variable Gaussian graphical 

model (LVGG) and sparse Gaussian graphical model (SGG) 



Exp. Number 




LVGG 




SGG 










Rank of L 


Sparsity of S 


N Loglike 


Sparsity of K 


N Loglike 


1 


48 


30 


-2191.3 


24734 


-1728.8 


2 


47 


64 


-2322.7 


28438 


-1994.1 


3 


50 


58 


-2669.9 


35198 


-2526.3 


4 


52 


64 


-2534.6 


30768 


-2282.5 


5 


48 





-2924.0 


29880 


-2841.4 


6 


51 


52 


-2707.1 


28754 


-2642.6 


7 


45 





-2873.3 


30374 


-2801.4 


8 


49 





-2765.5 


31884 


-2536.7 


9 


48 


54 


-2352.0 


29752 


-2087.2 


10 


47 





-2922.9 


29760 


-2843.5 



We also investigated generalization ability of latent variable Gaussian graphical selection model (3) versus 
sparse Gaussian graphical model (1) using this data set. A subset of the data, 1000 genes with highest 
variances, are used for this experiment. The 300 samples are randomly divided into 200 for training and 100 
for testing. Denote the negative log likelihood (up to a constant difference) 

N Loglike ^ - logdct ^ + tr(^E"), 

where S" is the empirical covariance matrix using observed sample data and A is the estimated covariance 
matrix based on model (3) or model (1). It easy to see that N Loglike is equivalent to negative Log-likelihood 
function up to some scaling. Therefore, we use NLoglike as a criteria for cross-validation or prediction. 
Regularization parameters Ai, A2 for model (3) and A for model (1) are selected by 10-fold cross validation on 
training set. Table 3 shows that latent variable Gaussian graphical selection model (3) consistently outperform 
sparse Gaussian graphical model (1) in terms of generalization ability using criteria NLoglike. We also note 
that latent variable Gaussian graphical selection model (3) tend to use moderate number of hidden units, 
and very sparse conditional correlation to explain the data. For p = 1000, it tend to predict about 50 hidden 
units, and the number of direct interconnections between observed variables are tens, and sometimes even 
0. This suggests that most of the correlations between genes observed in the mRNA measurement can be 
explained by only a small number of latent factors. Currently we only tested the generalization ability of 
latent variable Gaussian graphical selection model using NLoglike. The initial result with gene expression 
data is encouraging. Further work (model selection and validation) will be done by incorporating other prior 
information or by comparing with some known gene interactions. 

4. Discussion 

Graphical model selection in high-dimension arises in a wide range of applications. It is common that in 
many of these applications, only a subset of the variables are directly observable. Under this scenario, the 
marginal concentration matrix of the observed variables is generally not sparse due to the marginalization 
of latent variables. A computational attractive approach is to decompose the marginal concentration matrix 
into a sparse matrix and a low-rank matrix, which reveals the conditional graphical model structure in the 
observed variables as well as the number of and effect due to the hidden variables. Solving the regularized 
maximum likelihood problem is however nontrivial for large-scale problems, because of the complexity of the 
log-likelihood term, the trace norm penalty and £1 norm penalty. In this work, we propose a new approach 
based on the split Bregman method (SBLVGG) to solve it. We show that our algorithm is at least three 
times faster than the state-of-art solver for large-scale problems. 

We applied the method to analyze the expression of genes in yeast in a dataset consisting of thousands of 
genes measured over 300 different experimental conditions. It is interesting to note that the model considering 
the latent variables consistently outperforms the one without considering latent variables in term of testing 
likelihood. We also note that most of the correlations observed between mRNAs can be explained by only 
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a few dozen latent variables. The observation is consistent with the module network idea proposed in the 
genomics community. It also might suggest that the postranscriptional regulation might play more prominent 
role than previously appreciated. 
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